Stereo vision


Stereo vision is a popular method for artificial vision-based environment perception, used in applications such as intelligent transportation. With two cameras, a disparity map is calculated to find the distance and depth of objects in front of a moving vehicle. The key element of the stereoscopic system is the sum of absolute differences (SAD) algorithm, which is the most repeated operation in the stereo matching subsystem. However, this algorithm requires very intensive processing: statistical analyses show that the SAD block can consume more than 80% of the overall processing time of the algorithm. In this paper, we propose a highly efficient hardware architecture of the SAD algorithm for real-time stereo matching. The proposed architecture is built as a hierarchical parallel architecture of the SAD block; it has been verified by simulation and successfully implemented on a Cyclone IV field programmable gate array (FPGA). It provides a significant reduction in processing time, and the stereo imaging system achieves 30 frames per second on 640x480 color images.

1. INTRODUCTION


Stereo vision is an imaging technique that works on the same principle as human binocular vision. It uses two cameras to generate a three-dimensional image: a depth map is generated by comparing the offsets of homologous pixels in the two camera images, and this depth map is a three-dimensional representation of the real world. Such systems are becoming increasingly essential for several applications, such as mobile robot navigation and intelligent transportation. Intelligent vehicles and mobile robots can use stereo vision to improve their perception of the environment and, consequently, the performance of object detection, which is one of the main challenges of these applications.

The main idea of stereo vision systems is to estimate image depth from a disparity map obtained through stereo matching. The core of the stereo vision system is a computational block that compares the two images to find matching objects and estimates the distance between these objects and the cameras. This comparison relies on several algorithms, such as the sum of absolute differences (SAD) and the sum of squared differences (SSD) [1], which find the homologous pixels.

The SAD algorithm measures the similarity between image blocks by taking the absolute difference between each pixel in the right image block and the corresponding pixel in the left image block. These differences are summed to create a simple measure of block similarity. Methods for computing the disparity map fall into three general approaches: local, global, and semi-global algorithms, as presented in [2]-[4]. Local algorithms try to find optimal disparities for small image regions with fixed windows using local differences, whereas global algorithms require a large computational time because they rely on global cost optimization. Finally, semi-global algorithms combine the advantages of local and global algorithms while keeping the complexity relatively low. Previous analyses of the complexity of these three approaches show that local algorithms offer a good compromise between computational complexity and real-time processing constraints.

To reconcile the high computational complexity of the SAD algorithm with the fast processing required of stereo vision systems on embedded platforms, different optimization approaches have been explored in recent years. Many research works have been devoted to implementing the SAD algorithm in software and in hardware to achieve real-time processing for stereo vision systems. Most software implementations use graphics processing units (GPUs), which provide an efficient environment for parallel processing of the SAD algorithm. For example, Lu et al. [4] propose a dynamic window algorithm implemented on a GPU that achieves a speed of 17 fps. In another GPU-based implementation, Gong et al. [5] propose a software implementation of the SAD algorithm whose results show a processing performance of 10 fps. In Wang et al. [6], a multiple obstacle detection and tracking system using stereo vision is developed to avoid collisions in real time. Other works [3], [7] present real-time implementations of the stereo vision system based on GPUs. Another way to reduce the processing time of the SAD algorithm is a software implementation on digital signal processing (DSP) and Acorn RISC Machine (ARM) processors. Lin and Chiu [8] propose a new approach to implementing a stereo vision system based on the SAD algorithm that uses a DSP processor to speed up the calculation of the difference sums. Another software implementation, based on an ARM processor with software parallelism of the SAD algorithm, has been proposed by Sejai et al. [9].

However, because of the high computational complexity of the SAD algorithm, a hardware implementation on a reconfigurable circuit is often required, and many hardware implementations have been proposed in the literature. A hardware implementation of the SAD algorithm on field programmable gate array (FPGA) components can accelerate the processing of the stereo vision system by exploiting the parallel logic resources of FPGAs. Wang et al. [10] implemented a real-time, high-quality stereo vision system on an FPGA using absolute difference census cost initialization, cross-based cost aggregation, and semi-global optimization; the system provides high-quality depth results for high-definition images. Seo et al. [11] propose a new hardware architecture for the stereo vision system that reuses intermediate results to minimize processing time and memory access. Zha et al. [12] propose an improved global stereo matching algorithm implemented on a single FPGA for real-time applications; the proposed system is fully pipelined and provides a dense disparity image in real time. Kalomiros and Lygouras [13], [14] propose a system with two stereo accelerators implemented in reconfigurable hardware, in which the SAD algorithm and a dynamic programming technique are compared. Georgoulas and Andreadis [15] present the parallel pipelined design and hardware implementation of a real-time stereo vision system based on a fuzzy inference system. In Gardel et al. [16], the SAD and SSD algorithms implemented in an FPGA were used to obtain a disparity map. Ferreira et al. [17], on the other hand, use background subtraction to detect the target object and create segmented images. Murphy et al. [18] use the census transform to calculate the disparity. In more recent work, Manjunatha et al. [19] propose an FPGA implementation of the SAD algorithm for video applications. In Koshta et al. [20], an absolute difference circuit implemented on FPGA is proposed to increase speed and to minimize the FPGA resources occupied by the SAD calculation. In QianYu and Yi [21], an FPGA-based hardware platform is built for a target detection and tracking system; after analyzing the SAD algorithm and the cross search algorithm, that work focuses on the design of a parallel search structure for the tracking module, a parallel matching structure for the search module, and a parallel computing structure for the SAD unit.

Based on an analysis of previous work on stereo vision systems, the disparity calculation is complex and is obtained using the census transform, SAD, SSD, or dynamic programming techniques. These mathematical operations must be performed efficiently to be usable in real-time stereo vision systems. Therefore, our work proposes a simple and fast architecture for disparity calculation based on the SAD algorithm. In this paper, we present an optimized architecture of the SAD algorithm implemented on a reconfigurable FPGA. This architecture relies on parallel processing to accelerate the computation and meet the processing time constraints of a stereo vision system. The system takes two image inputs (left and right) at video graphics array (VGA) resolution (640x480 pixels), and it is generic in terms of window size and disparity range.

The remainder of this paper is organized as follows. Section 2 gives an overview of the stereo vision techniques available in the literature. Section 3 presents the design methodology and the proposed architecture of the SAD algorithm. Section 4 describes the details of the hardware implementation and the experimental results. Finally, Section 5 concludes the work.


2. OVERVIEW OF STEREO VISION SYSTEM 
2.1. Stereo vision algorithm 

Stereo vision algorithms use binocular vision to reconstruct the third dimension lost in images; the technique uses two images of the same scene taken from different viewpoints. The geometry of a stereo vision system with a parallel camera configuration is based on epipolar geometry [2], which considerably limits the search area by reducing it to a single horizontal dimension. Depth perception in stereo vision is mainly provided by the matching computation that finds homologous pixels between the two images. This matching is based on squared intensity differences (SD) or absolute intensity differences (AD), which are the most common methods, as described by Scharstein et al. [2].

The binocular stereo vision system consists of a left and a right camera with the same specifications [22]. The cameras lie in the same plane, so that the horizontal axes of the left and right cameras are collinear and parallel to the imaging plane [23]. Equation (1) gives the disparity d of a point P, where X_L and X_R are the horizontal coordinates of the corresponding pixel in the left and right images; (2) then uses d to calculate the distance D between P and the two cameras.


$$d = X_L - X_R \qquad (1)$$


The relationship between distance and disparity is given by (2): the distance between the cameras and an object is inversely proportional to the disparity. The disparity map contains the disparity of each pixel of the original image; it is calculated by the stereo vision algorithm and then used with triangulation to calculate the distance to the object. In (2), B is the baseline of the stereo camera (the distance between the two optical centers) and F is the focal length of the camera.


$$D = \frac{B \times F}{d} \qquad (2)$$
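
As a small numerical illustration of (1) and (2), the Python sketch below computes a disparity and converts it into a distance. It assumes the left image is the reference, so that d = x_L - x_R is positive, and that F is expressed in pixels so that B x F / d has the unit of B; the function names are hypothetical and not part of the described hardware.

```python
def disparity(x_left: int, x_right: int) -> int:
    """Disparity of one point, assuming the left image is the reference (d = x_L - x_R)."""
    return x_left - x_right


def depth_from_disparity(d: float, baseline: float, focal_px: float) -> float:
    """Distance D = (B * F) / d from equation (2).

    baseline: B, in the unit wanted for D (e.g. metres)
    focal_px: F, the focal length expressed in pixels (assumed unit)
    d: disparity in pixels; must be positive (d = 0 means a point at infinity)
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    return baseline * focal_px / d


if __name__ == "__main__":
    d = disparity(320, 300)                       # 20 pixels
    print(depth_from_disparity(d, 0.12, 700.0))   # about 4.2 m for B = 0.12 m, F = 700 px
```

Doubling the disparity halves the distance, which is the inverse proportionality stated above; it also means depth resolution degrades quickly for distant objects with small disparities.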
The stereo vision algorithm relies on block matching, that is, on matching the pixels of the left and right images that correspond to the projection of the same real-world point P. A block of pixels surrounding the pixel under consideration is used for the matching. This group of pixels is called the matching window; it increases the accuracy of the match because a window describes an object more reliably than a single pixel. A matching window in the left image is compared with candidate windows in the right image using SAD block matching, and the number of candidate windows is set by the disparity range. The best-matching window in the right image is selected, and the difference between the horizontal coordinates of the two center pixels is taken as the disparity value for that pixel. Repeating this for every pixel of the image produces the disparity map, as illustrated in Figure 1.


Figure 1. The stereo matching algorithm (left and right frames, left and right windows, SAD calculation over the disparity range, disparity map)

2.2. SAD block matching algorithm

SAD is a pixel-based matching method [24]. A stereo vision system uses the SAD algorithm to compare a window of pixels in one image with a window of pixels in the second image in order to find corresponding pixels. The SAD algorithm calculates the absolute difference between each pair of matching pixels and sums all these values to obtain the SAD value, as shown in Figure 2.


Figure 2. Global view of one SAD block (an n x n left window and an n x n right window produce one SAD value)
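
The operation of the single SAD block in Figure 2 can be written in a few lines. This is a minimal software model, assuming grayscale windows stored as nested lists; the names `sad` and `Window` are illustrative only.

```python
from typing import Sequence

Window = Sequence[Sequence[int]]  # n x n window of grayscale intensities


def sad(left: Window, right: Window) -> int:
    """Sum of absolute differences between two windows of identical shape."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("windows must have the same shape")
    return sum(
        abs(p - q)
        for row_l, row_r in zip(left, right)
        for p, q in zip(row_l, row_r)
    )


if __name__ == "__main__":
    left = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    right = [[12, 18, 30], [40, 55, 60], [70, 80, 85]]
    print(sad(left, right))  # 2 + 2 + 5 + 5 = 14
```

A SAD of 0 means the two windows are identical; larger values mean less similar windows.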


A single SAD value gives no information, by itself, about whether two pixels correspond. SAD values are therefore calculated between the reference window and many candidate windows, and the candidate with the smallest SAD value is taken as the best match. Many methods have been defined to calculate the disparity d between the candidate and reference images, such as the work of Lazaros et al. [25], which uses an n x n window to scan the image. There are three main methods available to perform this calculation [26]. The first method is based on the SSD algorithm, which is given by (3).


$$SSD(x, y, d) = \sum_{(x,y) \in W} \left( I_L(x, y) - I_R(x, y - d) \right)^2 \qquad (3)$$


The normalized cross correlation (NCC) algorithm is the second method for determining the correspondence between two windows around a pixel of interest. It is more accurate and less sensitive to proportional changes in intensity, but it is computationally very expensive. Equation (4) gives the formula of the NCC technique.


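$$NCC(x, y, d) = \frac{\sum_{(x,y) \in W} I_L(x, y) \, I_R(x, y - d)}{\sqrt{\sum_{(x,y) \in W} I_L(x, y)^2 \; \sum_{(x,y) \in W} I_R(x, y - d)^2}} \qquad (4)$$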
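
For comparison, the SSD and NCC measures of (3) and (4) can be modelled in the same way. The sketch assumes nested-list windows as before; note that SSD, like SAD, is minimized at the best match, whereas NCC is maximized (a value of 1 means the two windows are proportional). The multiplications, the division, and the square root are what make these two measures costlier in hardware.

```python
import math
from typing import Sequence

Window = Sequence[Sequence[int]]


def ssd(left: Window, right: Window) -> int:
    """Sum of squared differences, equation (3): lower is a better match."""
    return sum((p - q) ** 2 for rl, rr in zip(left, right) for p, q in zip(rl, rr))


def ncc(left: Window, right: Window) -> float:
    """Normalized cross correlation, equation (4): closer to 1 is a better match."""
    num = sum(p * q for rl, rr in zip(left, right) for p, q in zip(rl, rr))
    energy_l = sum(p * p for row in left for p in row)
    energy_r = sum(q * q for row in right for q in row)
    den = math.sqrt(energy_l * energy_r)
    return num / den if den else 0.0  # an all-zero window: NCC defined as 0 here


if __name__ == "__main__":
    w = [[1, 2], [3, 4]]
    brighter = [[2, 4], [6, 8]]  # same pattern, intensity doubled
    print(ssd(w, brighter))      # 30: SSD penalizes the intensity change
    print(ncc(w, brighter))      # 1.0: NCC ignores proportional changes
```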


The last method is the SAD algorithm, whose formula is given in (5). It considers the absolute difference between the intensity of each pixel in the reference block and that of the corresponding pixel in the target block, where (x, y) is the location of each pixel in the matching window W, and I_L and I_R are the intensities of the left and right image pixels. The minimum SAD over the disparity range gives the location of the best match for the template in the source image, so the disparity d is calculated by (6).


$$SAD(x, y, d) = \sum_{(x,y) \in W} \left| I_L(x, y) - I_R(x, y - d) \right| \qquad (5)$$
$$d^{*} = \arg\min_{d} SAD(x, y, d) \qquad (6)$$


The disparity map is obtained by iterating over each pixel of the left image. For each pixel, the sum of absolute differences is calculated over the entire window centered on the pixel under consideration. The window of the right image is then shifted by one pixel to the left and the SAD value is calculated again. This operation is repeated until the whole disparity range has been analyzed, and the resulting disparity value is the disparity level that produced the minimum SAD value.
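
The full-search procedure just described (winner-take-all over the disparity range) is sketched below in plain Python. It follows (5) and (6) with the left image as reference, candidate windows centered on column y - d, and the (x, y) = (row, column) naming of the equations. Skipping border pixels, and candidates that would leave the image, is our assumption rather than a detail given here; `disparity_map` is a hypothetical name.

```python
from typing import List

Image = List[List[int]]  # grayscale image as a list of rows


def disparity_map(left: Image, right: Image, n: int = 3, d_max: int = 19) -> Image:
    """Winner-take-all SAD block matching, left image as reference.

    Border pixels whose n x n window does not fit are left at disparity 0.
    """
    h, w = len(left), len(left[0])
    r = n // 2
    out = [[0] * w for _ in range(h)]
    for x in range(r, h - r):            # x: row index
        for y in range(r, w - r):        # y: column index
            best_d, best_sad = 0, None
            for d in range(d_max + 1):
                if y - d - r < 0:        # candidate window would leave the right image
                    break
                s = sum(
                    abs(left[x + i][y + j] - right[x + i][y + j - d])
                    for i in range(-r, r + 1)
                    for j in range(-r, r + 1)
                )
                if best_sad is None or s < best_sad:  # strict <: smallest d wins ties
                    best_d, best_sad = d, s
            out[x][y] = best_d
    return out
```

With a 3x3 window and 20 disparities, every pixel costs 20 x 9 = 180 absolute differences in this sequential form, which is the workload the parallel hardware architecture targets.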

The SAD algorithm has been used in many research studies. Because it relies on simple calculations, it is well suited to hardware implementation: SSD has a higher computational complexity than SAD because it involves many multiplications, and NCC is even more complex than both SAD and SSD because it involves many multiplications, divisions, and square roots [26]. The SAD method [27] is the most widely used matching technique in stereo vision algorithms, owing to its low computational complexity, good performance, and ease of hardware implementation.


3. PROPOSED HARDWARE ARCHITECTURE 

For a real-time implementation of the stereo vision system, a hardware architecture of the SAD algorithm is proposed. This architecture uses two separate buffers to store the input data of the left and right 640x480 images in order to reduce memory access. The window buffers store the successive lines of the left and right images, respectively, and contain the data needed to calculate the result for point (i, j). The window buffer is generic, so any n x n window size can be used. The hardware design exploits parallel data processing to meet real-time processing constraints. Figure 3 shows the top level of the proposed hardware architecture of the SAD algorithm.
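
The role of the image buffers and window buffers can be modelled in software as a stream: only the last n image rows are kept (playing the part of the line buffers), and each new pixel column is shifted into an n x n window. The sketch below is a behavioural model under that assumption, not the VHDL design; `stream_windows` is a hypothetical name.

```python
from collections import deque
from typing import Iterator, List, Tuple


def stream_windows(
    image: List[List[int]], n: int = 3
) -> Iterator[Tuple[int, int, List[List[int]]]]:
    """Yield (row, col, window) for every position where a full n x n window fits.

    (row, col) is the window center. Each image row is read once and kept in a
    bounded deque of n rows, mimicking on-chip line buffers.
    """
    r = n // 2
    lines: deque = deque(maxlen=n)  # line buffers: the n most recent rows
    for row_idx, row in enumerate(image):
        lines.append(row)           # the oldest row is dropped automatically
        if len(lines) < n:
            continue
        window: deque = deque(maxlen=n)  # window buffer: n columns of n pixels
        for col_idx in range(len(row)):
            window.append([line[col_idx] for line in lines])  # shift in one column
            if len(window) == n:
                cols = list(window)
                rows = [[c[k] for c in cols] for k in range(n)]  # columns -> rows
                yield row_idx - r, col_idx - r, rows
```

For a 640x480 image and n = 3, this yields 638 x 478 windows while reading each pixel from the source only once.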


Figure 3. Hardware block diagram of the proposed architecture (left and right image buffers, left and right image window buffers, finite state machine, sum of absolute differences computing block)


The main contribution of this architecture is an original, parallel SAD block. This SAD computation block requires only two basic operations, addition and subtraction. SAD is the simplest metric that takes every pixel of the block into account, each independently of the others, which makes its implementation simple and easy to parallelize. As a result, the SAD block is mainly composed of absolute difference (abs/sub) blocks, adders, and a multiplexer that indicates the disparity d. The multiplexer compares two input SAD values and outputs the minimum, which is used in the comparison at the next step. The left image window and all the right image windows from (i, j) to (i, j + disparity range) are sent to the SAD block, which computes the SAD values in parallel to speed up the computation of our system. All these blocks are controlled and synchronized by a finite state machine, as shown in Figure 4.
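
A behavioural sketch of this datapath follows: abs/sub units and an adder produce one SAD per candidate window, and a chain of compare-select stages (a comparator driving a 2:1 multiplexer) keeps the running minimum together with its disparity index. The names are hypothetical, and the chained reduction is one possible reading of the multiplexer description above; a balanced tree of the same compare-select stages would give the same result.

```python
from typing import List, Sequence, Tuple

Window = Sequence[Sequence[int]]


def sad_unit(left: Window, right: Window) -> int:
    """abs/sub units (one per pixel pair) followed by an adder."""
    diffs = [abs(p - q) for rl, rr in zip(left, right) for p, q in zip(rl, rr)]
    return sum(diffs)


def compare_select(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Comparator + 2:1 multiplexer on (sad, disparity) pairs.

    Keeps `a` on ties; in the chain below `a` always holds the smaller disparity.
    """
    return b if b[0] < a[0] else a


def best_disparity(left: Window, candidates: List[Window]) -> Tuple[int, int]:
    """Return (minimum SAD, disparity) over candidate windows indexed by disparity."""
    sads = [sad_unit(left, c) for c in candidates]  # evaluated in parallel in hardware
    best = (sads[0], 0)
    for d in range(1, len(sads)):
        best = compare_select(best, (sads[d], d))
    return best
```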

To compute the SAD between a 3x3 window centered on (i, j) in the left image and a 3x3 window centered on (i, j + d) in the right image, with d spanning the disparity range of 20 pixels, the 9 absolute pixel differences are calculated in parallel and then summed to produce the SAD value. At each clock cycle, the right window moves one position to the right by shifting in a new window column. This continues until the maximum disparity is reached, at which point the disparity giving the minimum SAD is determined. The left window then moves by one column, while the right window is reloaded. For each block, we write the disparity d (horizontal distance) to the corresponding pixel in the disparity map. Figure 5 shows how the different steps of the SAD block calculation can be implemented.

As shown in Figure 6, the proposed SAD architecture is composed of three absolute difference blocks, two adders, and a multiplexer. Each absolute difference block calculates the sum of the three absolute differences between two three-pixel inputs (one row of the left and right windows). The multiplexer compares two input SAD values and outputs the minimum SAD value, which is used in the comparison at the next step.
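
The structure of Figure 6 maps directly onto a 3x3 window: one absolute difference block per window row, then two adders. The sketch below models it and also derives the accumulator widths, assuming 8-bit pixels (an assumption, since the pixel depth is not stated here): a row sum is at most 3 x 255 = 765, which needs 10 bits, and the window sum is at most 9 x 255 = 2295, which needs 12 bits.

```python
from typing import Sequence

PIXEL_MAX = 255  # assumption: 8-bit grayscale pixels

# Worst-case values and the register widths they require.
ROW_SUM_BITS = (3 * PIXEL_MAX).bit_length()     # 765  -> 10 bits
WINDOW_SUM_BITS = (9 * PIXEL_MAX).bit_length()  # 2295 -> 12 bits


def abs_diff3(a: Sequence[int], b: Sequence[int]) -> int:
    """One absolute difference block: sum of |a_k - b_k| over one 3-pixel window row."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(a[2] - b[2])


def sad3x3(left: Sequence[Sequence[int]], right: Sequence[Sequence[int]]) -> int:
    """Three row blocks evaluated independently, then two adders."""
    r0 = abs_diff3(left[0], right[0])
    r1 = abs_diff3(left[1], right[1])
    r2 = abs_diff3(left[2], right[2])
    partial = r0 + r1    # first adder
    return partial + r2  # second adder
```

Because the three row blocks do not depend on each other, they can run in the same clock cycle, and only the two adders sit on the critical path after them.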

The SAD window used to calculate the disparity is 3x3 pixels, and the disparity range of our design is 0 to 19 pixels. Image rows are read from the left and right images into buffers. This speeds up access to the pixels of the next SAD window, because external memory access is very slow compared with the FPGA's internal memory. The three output rows of each image buffer are shifted into the window buffers, giving simultaneous access to the 20 values needed to compute the minimum SAD over 20 SAD windows and thus the disparity d between homologous pixels. From then on, at each clock cycle the right window buffer shifts to the right by one window column, instead of re-reading the whole window from the image buffer. This operation continues until the maximum disparity has been reached. At this point, the left window buffer shifts by one window column, and the right window buffer restarts the calculation of SAD values from this new column position.
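
The benefit of shifting one new column into the right window buffer per clock cycle, rather than re-reading the whole window, can be checked with a small model. The sketch below assumes three buffered right-image rows and, following the description above, moves the right window one column to the right per candidate; it counts pixel reads, giving 9 + 19 x 3 = 66 reads for 20 candidates instead of 20 x 9 = 180. The function name is hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def scan_right_window(
    left_win: Sequence[Sequence[int]],
    right_rows: Sequence[Sequence[int]],
    start_col: int,
    num_disp: int = 20,
) -> Tuple[List[int], int]:
    """Slide an n x n right window over num_disp positions, one column per step.

    right_rows holds the n buffered right-image rows. Returns the SAD for each
    candidate (left edge at start_col + d) and the number of pixels read.
    """
    n = len(left_win)
    # Window buffer as a shift register of columns; maxlen drops the oldest column.
    win = deque(
        (tuple(row[c] for row in right_rows) for c in range(start_col, start_col + n)),
        maxlen=n,
    )
    reads = n * n  # initial load of the full window
    sads = []
    for d in range(num_disp):
        if d > 0:
            new_col = start_col + n - 1 + d
            win.append(tuple(row[new_col] for row in right_rows))  # shift in one column
            reads += n
        # win[m][k] is row k of window column m
        sads.append(sum(abs(left_win[k][m] - win[m][k]) for k in range(n) for m in range(n)))
    return sads, reads
```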


Figure 4. Finite state machine diagram 


Figure 5. SAD block calculation (3x3 window (i, j) in the left and right images, parallel SAD blocks, and the minimum of the 20 values giving the disparity d)

Figure 6. Proposed SAD architecture


4. EXPERIMENTAL RESULTS

The proposed SAD architecture is described in very high speed integrated circuit hardware description language (VHDL) and synthesized for an Altera Cyclone IV FPGA. Figure 7 shows the experimental steps for implementing the proposed reconfigurable SAD system. To verify the output data of the stereoscopic system, various stereo image pairs from the Middlebury dataset are tested.


Figure 7. Block diagram of the global system (left and right images stored in SDRAM1 and SDRAM2, greyscale image conversion, proposed SAD architecture, disparity map, triangulation method, depth map, VGA display and 7-segment display)


Both the left and right images are converted to grayscale and stored in the external synchronous dynamic random-access memory (SDRAM) of the FPGA platform. The triangulation method is used to determine the distance between the object and the stereo system. The disparity map is displayed on the VGA monitor together with the depth of the different objects. The implementation was done on an Altera Cyclone IV FPGA. Figure 8 shows the schematic view of the synthesis result.
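
The grayscale conversion step is not detailed here. One common hardware-friendly choice, given only as an assumption, is an integer approximation of the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) scaled by 256, so that the division becomes an 8-bit right shift. The constants and function names below are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Assumed coefficients: round(256 * 0.299), round(256 * 0.587), round(256 * 0.114).
# They sum to 256, so white maps exactly to 255.
W_R, W_G, W_B = 77, 150, 29


def to_gray(r: int, g: int, b: int) -> int:
    """Integer luma approximation: (77 R + 150 G + 29 B) >> 8."""
    return (W_R * r + W_G * g + W_B * b) >> 8


def image_to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Convert an image given as rows of (R, G, B) tuples to 8-bit grayscale."""
    return [[to_gray(*px) for px in row] for row in image]
```

In hardware, this needs three constant multipliers (or shift-and-add networks) and one adder per pixel, with no divider.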

The results of the hardware implementation confirm that the proposed system can accurately generate depth maps from left and right images. Table 1 gives the complete device utilization summary: our design uses 576 logic elements and only 49152 bits of internal memory. A single SAD block processes three image lines in parallel; the performance of the algorithm could be improved further by adding more parallel SAD blocks.

To evaluate the proposed SAD architecture, its hardware implementation is compared with similar works in terms of device, window size, disparity range, algorithm, and frame size. The comparison shows that the proposed architecture performs well and exhibits higher computational efficiency than the other works. In addition, the parallel architecture effectively reduces the computation time. For two 640x480 stereo images and a 500 MHz FPGA clock frequency, this gives approximately 16x10^6 clock cycles.
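
One way to relate these figures, under our own assumptions, is a simple cycle budget: at 500 MHz and 30 fps, about 500 x 10^6 / 30, roughly 16.7 x 10^6, clock cycles are available per stereo pair. If a single SAD block evaluates one candidate window per clock cycle (an assumption consistent with the shifting window described in Section 3), a full search over 640 x 480 pixels and 20 disparities needs 6.144 x 10^6 cycles, which fits within that budget. The helper names are hypothetical.

```python
def cycles_per_frame(clock_hz: float, fps: float) -> float:
    """Clock cycles available for one stereo pair at the target frame rate."""
    return clock_hz / fps


def full_search_cycles(width: int, height: int, num_disp: int, sads_per_cycle: int = 1) -> int:
    """Cycles for a full search, assuming sads_per_cycle candidate windows per clock."""
    total = width * height * num_disp
    return -(-total // sads_per_cycle)  # ceiling division


if __name__ == "__main__":
    budget = cycles_per_frame(500e6, 30)       # about 16.7 million cycles
    needed = full_search_cycles(640, 480, 20)  # 6,144,000 cycles
    print(f"budget={budget:.3e}, needed={needed}, max fps={500e6 / needed:.1f}")
```

Raising `sads_per_cycle`, i.e. adding parallel SAD blocks, divides the required cycles accordingly, which is the improvement path mentioned above.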

To solve the problem of the long processing time of full-search block matching in the SAD algorithm, a parallel hardware architecture of the SAD algorithm is proposed. As mentioned above, this parallel hardware architecture reduces the processing time of the full search algorithm and meets real-time requirements. Experimentally, our FPGA implementation of the stereo vision system is very efficient in terms of computational speed-up and operates in real time at 30 frames per second (FPS), compared with the similar works presented in Table 2. Moreover, the low FPGA resource usage reported in Table 1 makes the system flexible: the proposed SAD architecture occupies fewer FPGA resources than the works referenced in [28], [29], leaving room for designers to implement other algorithms.

Figure 8. The RTL schematic view of the proposed architecture


Table 1. Resources required for implementing the proposed architecture on the Cyclone IV FPGA device 


Resource                        Used    Available   Used (%)
Total logic elements            576     21280       3%
Total combinational functions   572     21280       3%
Dedicated logic registers       367     21280       2%
Total pins                      60      167         36%
Total memory bits               49152   774144      6%


Table 2. Comparison with other works 


Reference   Device                    FPS        Window size   Disparity range   Algorithm   Frame size
[30]        Altera Cyclone II         30         3x3           32                SAD         640x480
[31]        Altera Cyclone III        30         5x5           64                SAD         1396x1110
[32]        Altera Stratix IV         46 / 400   25x25         256 / 128         SAD         1280x1024 / 640x480
[33]        Xilinx Virtex XCV-2000E   31         8x8           64                SAD         1024x768
[34]        Xilinx Virtex             25.6       5x5           255               SAD         512x512
[28]        Altera Cyclone III        30         5x5           64                SAD         1396x1110
Our work    Altera Cyclone IV         30         3x3           20                SAD         640x480


5. CONCLUSION 

This work proposes a real-time hardware architecture for a stereo vision system. The design has been described in VHDL, then synthesized and routed on an Altera Cyclone IV FPGA. The architecture is based on parallel data access and processing, and uses a new design of the SAD block to reduce the massive computation time associated with disparity map generation in the stereo vision system. The experimental results show that our complete stereo vision system can process 30 fps at 640x480 resolution with a disparity range of 20 pixels. The proposed system can be adopted to support real-time applications such as automotive and robot navigation.